[Core][VL] Add random parquet data generator and ShuffleWriterFuzzerTest #3584

Merged (2 commits) on Jan 15, 2024

Contributor

@marin-ma marin-ma commented Nov 1, 2023

Add ShuffleWriterFuzzerTest, which uses RandomParquetDataGenerator to generate a random schema, input batch size, and data for shuffle. The test aims to evaluate the shuffle module's correctness and spill behavior. After any modification that could affect the shuffle module, developers should first pass the C++ unit tests and then run this test manually.
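
As a rough illustration of the idea (not the actual RandomParquetDataGenerator, which is not shown in this conversation), the sketch below derives a schema, a batch size, and row data from a single seed, so the same seed always reproduces the same case. The function names, the type pool, and the size limits are all hypothetical.

```python
import random

# Hypothetical type pool; the types the real generator supports are not
# listed in this PR description.
FIELD_TYPES = ("int", "float", "string", "bool")


def random_value(rng, field_type):
    """Draw one value of the given (hypothetical) field type from rng."""
    if field_type == "int":
        return rng.randint(-2**31, 2**31 - 1)
    if field_type == "float":
        return rng.uniform(-1e6, 1e6)
    if field_type == "string":
        length = rng.randint(0, 16)
        return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))
    return rng.random() < 0.5


def generate_case(seed, max_fields=6, max_batch_size=64, num_rows=100):
    """Derive schema, batch size, and rows from the seed alone."""
    rng = random.Random(seed)
    schema = [(f"f{i}", rng.choice(FIELD_TYPES))
              for i in range(rng.randint(1, max_fields))]
    batch_size = rng.randint(1, max_batch_size)
    rows = [tuple(random_value(rng, t) for _, t in schema)
            for _ in range(num_rows)]
    return schema, batch_size, rows
```

Because every random draw goes through one `random.Random(seed)` instance, calling `generate_case` twice with the same seed yields identical output, which is the property that makes a failing iteration reproducible from its logged seed.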

By default, each test runs 10 iterations. Failed iterations are printed as `Failed to run test 'testname' with seed: xxxxx, ...`. To reproduce a failure, developers can take the SQL query corresponding to the test name and the seeds from the log, then modify the reproduce test accordingly. It's recommended to build the C++ code with the Debug build type before running this test.
If any iterations fail because of OOM, they are logged as errors in the form `Out of memory while running test 'testname' with seed: xxxxx, ...`. Iterations that hit OOM do not fail the test case.
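
The iteration policy described above can be sketched as follows. This is a minimal illustration, not the test's actual code: `run_fuzz_iterations`, `run_once`, and `seed_source` are hypothetical names, Python's `MemoryError` stands in for the OOM condition, and the elided tail of each log message is omitted rather than guessed.

```python
import logging
import random

log = logging.getLogger("shuffle_fuzzer_sketch")  # hypothetical logger name


def run_fuzz_iterations(test_name, run_once, iterations=10, seed_source=None):
    """Call run_once(seed) once per iteration and return the failed seeds.

    Out-of-memory iterations are logged as errors but not counted as failures.
    """
    if seed_source is None:
        seed_source = random.Random()
    failed_seeds = []
    for _ in range(iterations):
        seed = seed_source.getrandbits(63)
        try:
            run_once(seed)
        except MemoryError:
            # Logged only; this iteration does not fail the test case.
            log.error("Out of memory while running test '%s' with seed: %d",
                      test_name, seed)
        except Exception:
            log.error("Failed to run test '%s' with seed: %d", test_name, seed)
            failed_seeds.append(seed)
    return failed_seeds
```

A caller would fail the test case only if the returned list is non-empty, and could rerun `run_once` with any logged seed to reproduce that iteration.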

ShuffleWriterFuzzerTest is tagged as SkipTestTags so it won't be run in CI.

@marin-ma changed the title from "[Core][VL] Add random parquet data generator and ShuffleWriterFuzzerTest" to "[Core][VL] (WIP) Add random parquet data generator and ShuffleWriterFuzzerTest" on Nov 1, 2023

github-actions bot commented Nov 1, 2023

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

github-actions bot commented Nov 1, 2023

Run Gluten Clickhouse CI

github-actions bot commented Nov 1, 2023

Run Gluten Clickhouse CI

github-actions bot commented Nov 2, 2023

Run Gluten Clickhouse CI

github-actions bot commented Nov 3, 2023

Run Gluten Clickhouse CI

github-actions bot commented Nov 6, 2023

Run Gluten Clickhouse CI

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the stale label Dec 22, 2023
github-actions bot removed the stale label Dec 30, 2023

Run Gluten Clickhouse CI

@marin-ma changed the title from "[Core][VL] (WIP) Add random parquet data generator and ShuffleWriterFuzzerTest" to "[Core][VL] Add random parquet data generator and ShuffleWriterFuzzerTest" on Jan 10, 2024

Run Gluten Clickhouse CI

@marin-ma
Contributor Author

@zhouyuan Could you help to review? Thanks!

@zhouyuan
Contributor

@marin-ma it seems there are several unit tests failed, I guess it's due to some gaps w/ main branch, can you please do a rebase to check again?

-yuan

@marin-ma
Contributor Author

> @marin-ma it seems there are several unit tests failed, I guess it's due to some gaps w/ main branch, can you please do a rebase to check again?
>
> -yuan

@zhouyuan The PR itself needs an update. I will fix it ASAP. Thanks!

Run Gluten Clickhouse CI

Contributor

@zhouyuan zhouyuan left a comment

👍
this seems a standalone component, looks good to me

@marin-ma marin-ma merged commit eb35adb into apache:main Jan 15, 2024
20 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

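| query | log/native_3584_time.csv | log/native_master_01_14_2024_00db209eb_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 33.44 | 33.95 | 0.501 | 101.50% |
| q2 | 24.71 | 24.88 | 0.164 | 100.66% |
| q3 | 37.85 | 38.01 | 0.167 | 100.44% |
| q4 | 38.45 | 38.25 | -0.201 | 99.48% |
| q5 | 71.92 | 71.46 | -0.460 | 99.36% |
| q6 | 8.39 | 7.30 | -1.088 | 87.03% |
| q7 | 84.06 | 85.49 | 1.430 | 101.70% |
| q8 | 85.31 | 85.37 | 0.062 | 100.07% |
| q9 | 125.48 | 122.58 | -2.900 | 97.69% |
| q10 | 45.82 | 44.98 | -0.846 | 98.15% |
| q11 | 20.23 | 19.89 | -0.344 | 98.30% |
| q12 | 28.89 | 28.31 | -0.577 | 98.00% |
| q13 | 47.09 | 47.25 | 0.159 | 100.34% |
| q14 | 18.32 | 19.45 | 1.131 | 106.18% |
| q15 | 30.94 | 29.55 | -1.389 | 95.51% |
| q16 | 14.42 | 14.42 | 0.004 | 100.02% |
| q17 | 102.74 | 102.43 | -0.309 | 99.70% |
| q18 | 148.25 | 148.49 | 0.241 | 100.16% |
| q19 | 13.52 | 13.56 | 0.035 | 100.26% |
| q20 | 27.27 | 27.19 | -0.083 | 99.70% |
| q21 | 226.68 | 228.47 | 1.789 | 100.79% |
| q22 | 13.85 | 13.86 | 0.009 | 100.06% |
| total | 1247.65 | 1245.14 | -2.506 | 99.80% |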
